Search Results: "Ian Jackson"

20 December 2022

Ian Jackson: Rust for the Polyglot Programmer, December 2022 edition

I have reviewed, updated and revised my short book about the Rust programming language, Rust for the Polyglot Programmer. It now covers some language improvements from the past year (noting which versions of Rust they re available in), and has been updated for changes in the Rust library ecosystem. With (further) assistance from Mark Wooding, there is also a new table of recommendations for numerical conversion. Recap about Rust for the Polyglot Programmer There are many introductory materials about Rust. This one is rather different. Compared to much other information about Rust, Rust for the Polyglot Programmer is:

Dense: I assume a lot of starting knowledge. Or to look at it another way: I expect my reader to be able to look up and digest non-Rust-specific words or concepts.
Broad: I cover not just the language and tools, but also the library ecosystem, development approach, community ideology, and so on.
Frank: much material about Rust has a tendency to gloss over or minimise the bad parts. I don t do that. That also frees me to talk about strategies for dealing with the bad parts.
Non-neutral: I m not afraid to recommend particular libraries, for example. I m not afraid to extol Rust s virtues in the areas where it does well.
Terse, and sometimes shallow: I often gloss over what I see as unimportant or fiddly details; instead I provide links to appropriate reference materials.

After reading Rust for the Polyglot Programmer, you won t know everything you need to know to use Rust for any project, but should know where to find it. Comments are welcome of course, via the Dreamwidth comments or Salsa issue or MR. (If you re making a contribution, please indicate your agreement with the Developer Certificate of Origin.)

edited 2022-12-20 01:48 to fix a typo

comments

18 December 2022

Ian Jackson: Rust needs #[throws]

tl;dr: Ok-wrapping as needed in today s Rust is a significant distraction, because there are multiple ways to do it. They are all slightly awkward in different ways, so are least-bad in different situations. You must choose a way for every fallible function, and sometimes change a function from one pattern to another. Rust really needs #[throws] as a first-class language feature. Code using #[throws] is simpler and clearer. Please try out withoutboats s fehler. I think you will like it. Contents

A recent personal experience in coding style Ever since I read withoutboats s 2020 article about fehler, I have been using it in most of my personal projects. For Reasons I recently had a go at eliminating the dependency on fehler from Hippotat. So, I made a branch, deleted the dependency and imports, and started on the whack-a-mole with the compiler errors. After about a half hour of this, I was starting to feel queasy. After an hour I had decided that basically everything I was doing was making the code worse. And, bizarrely, I kept having to make individual decisons about what idiom to use in each place. I couldn t face it any more. After sleeping on the question I decided that Hippotat would be in Debian with fehler, or not at all. Happily the Debian Rust Team generously helped me out, so the answer is that fehler is now in Debian, so it s fine. For me this experience, of trying to convert Rust-with-#[throws] to Rust-without-#[throws] brought the Ok wrapping problem into sharp focus. What is Ok wrapping? Intro to Rust error handling (You can skip this section if you re already a seasoned Rust programer.) In Rust, fallibility is represented by functions that return Result<SuccessValue, Error>: this is a generic type, representing either whatever SuccessValue is (in the Ok variant of the data-bearing enum) or some Error (in the Err variant). For example, std::fs::read_to_string, which takes a filename and returns the contents of the named file, returns Result<String, std::io::Error>. This is a nice and typesafe formulation of, and generalisation of, the traditional C practice, where a function indicates in its return value whether it succeeded, and errors are indicated with an error code. Result is part of the standard library and there are convenient facilities for checking for errors, extracting successful results, and so on. In particular, Rust has the postfix ? operator, which, when applied to a Result, does one of two things: if the Result was Ok, it yields the inner successful value; if the Result was Err, it returns early from the current function, returning an Err in turn to the caller. This means you can write things like this:

    let input_data = std::fs::read_to_string(input_file)?;

and the error handling is pretty automatic. You get a compiler warning, or a type error, if you forget the ?, so you can t accidentally ignore errors. But, there is a downside. When you are returning a successful outcome from your function, you must convert it into a Result. After all, your fallible function has return type Result<SuccessValue, Error>, which is a different type to SuccessValue. So, for example, inside std::fs::read_to_string, we see this:

        let mut string = String::new();
        file.read_to_string(&mut string)?;
        Ok(string)

string has type String; fs::read_to_string must return Result<String, ..>, so at the end of the function we must return Ok(string). This applies to return statements, too: if you want an early successful return from a fallible function, you must write return Ok(whatever). This is particularly annoying for functions that don t actually return a nontrivial value. Normally, when you write a function that doesn t return a value you don t write the return type. The compiler interprets this as syntactic sugar for -> (), ie, that the function returns (), the empty tuple, used in Rust as a dummy value in these kind of situations. A block ( ... ) whose last statement ends in a ; has type (). So, when you fall off the end of a function, the return value is (), without you having to write it. So you simply leave out the stuff in your program about the return value, and your function doesn t have one (i.e. it returns ()). But, a function which either fails with an error, or completes successfuly without returning anything, has return type Result<(), Error>. At the end of such a function, you must explicitly provide the success value. After all, if you just fall off the end of a block, it means the block has value (), which is not of type Result<(), Error>. So the fallible function must end with Ok(()), as we see in the example for std::fs::read_to_string. A minor inconvenience, or a significant distraction? I think the need for Ok-wrapping on all success paths from fallible functions is generally regarded as just a minor inconvenience. Certainly the experienced Rust programmer gets very used to it. However, while trying to remove fehler s #[throws] from Hippotat, I noticed something that is evident in codebases using vanilla Rust (without fehler) but which goes un-remarked. There are multiple ways to write the Ok-wrapping, and the different ways are appropriate in different situations. See the following examples, all taken from a real codebase. (And it s not just me: I do all of these in different places, - when I don t have fehler available - but all these examples are from code written by others.) Idioms for Ok-wrapping - a bestiary Wrap just a returned variable binding If you have the return value in a variable, you can write Ok(reval) at the end of the function, instead of retval.

    pub fn take_until(&mut self, term: u8) -> Result<&'a [u8]>  
        // several lines of code
        Ok(result)

If the returned value is not already bound to variable, making a function fallible might mean choosing to bind it to a variable. Wrap a nontrivial return expression Even if it s not just a variable, you can wrap the expression which computes the returned value. This is often done if the returned value is a struct literal:

    fn take_from(r: &mut Reader<'_>) -> Result<Self>  
        // several lines of code
        Ok(AuthChallenge   challenge, methods  )

Introduce Ok(()) at the end For functions returning Result<()>, you can write Ok(()). This is usual, but not ubiquitous, since sometimes you can omit it. Wrap the whole body If you don t have the return value in a variable, you can wrap the whole body of the function in Ok( ). Whether this is a good idea depends on how big and complex the body is.

    fn from_str(s: &str) -> std::result::Result<Self, Self::Err>  
        Ok(match s  
            "Authority" => RelayFlags::AUTHORITY,
            // many other branches
            _ => RelayFlags::empty(),
         )

Omit the wrap when calling fallible sub-functions If your function wraps another function call of the same return and error type, you don t need to write the Ok at all. Instead, you can simply call the function and not apply ?. You can do this even if your function selects between a number of different sub-functions to call:

    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result  
        if flags::unsafe_logging_enabled()  
            std::fmt::Display::fmt(&self.0, f)
          else  
            self.0.display_redacted(f)

But this doesn t work if the returned error type isn t the same, but needs the autoconversion implied by the ? operator. Convert a fallible sub-function error with Ok( ... ?) If the final thing a function does is chain to another fallible function, but with a different error type, the error must be converted somehow. This can be done with ?.

     fn try_from(v: i32) -> Result<Self, Error>  
         Ok(Percentage::new(v.try_into()?))

Convert a fallible sub-function error with .map_err Or, rarely, people solve the same problem by converting explicitly with .map_err:

     pub fn create_unbootstrapped(self) -> Result<TorClient<R>>  
         // several lines of code
         TorClient::create_inner(
             // several parameters
         )
         .map_err(ErrorDetail::into)

What is to be done, then? The fehler library is in excellent taste and has the answer. With fehler:

Whether a function is fallible, and what it s error type is, is specified in one place. It is not entangled with the main return value type, nor with the success return paths.
So the success paths out of a function are not specially marked with error handling boilerplate. The end of function return value, and the expression after return, are automatically wrapped up in Ok. So the body of a fallible function is just like the body of an infallible one, except for places where error handling is actually involved.
Error returns occur through ? error chaining, and with a new explicit syntax for error return.
We usually talk about the error we are possibly returning, and avoid talking about Result unless we need to.

fehler provides:

An attribute macro #[throws(ErrorType)] to make a function fallible in this way.
A macro throws!(error) for explicitly failing.

This is precisely correct. It is very ergonomic. Consequences include:

One does not need to decide where to put the Ok-wrapping, since it s automatic rather than explicitly written out.
Specifically, what idiom to adopt in the body (for example write!(...)?; vs write!(...) in a formatter) does not depend on whether the error needs converting, how complex the body is, and whether the final expression in the function is itself fallible.
Making an infallible function fallible involves only adding #[throws] to its definition, and ? to its call sites. One does not need to edit the body, or the return type.
Changing the error returned by a function to a suitably compatible different error type does not involve changing the function body.
There is no need for a local Result alias shadowing std::result::Result, which means that when one needs to speak of Result explciitly, the code is clearer.

Limitations of fehler But, fehler is a Rust procedural macro, so it cannot get everything right. Sadly there are some wrinkles.

You can t write #[throws] on a closure.
Sometimes you can get quite poor error messages if you have a sufficiently broken function body.
Code inside a macro call isn t properly visible to fehler so sometimes return statements inside macro calls are untreated. This will lead to a type error, so isn t a correctness hazard, but it can be nuisance if you like other syntax extensions eg if_chain.
#[must_use] #[throws(Error)] fn obtain() -> Thing; ought to mean that Thing must be used, not the Result<Thing, Error>.

But, Rust-with-#[throws] is so much nicer a language than Rust-with-mandatory-Ok-wrapping, that these are minor inconveniences. Please can we have #[throws] in the Rust language This ought to be part of the language, not a macro library. In the compiler, it would be possible to get the all the corner cases right. It would make the feature available to everyone, and it would quickly become idiomatic Rust throughout the community. It is evident from reading writings from the time, particularly those from withoutboats, that there were significant objections to automatic Ok-wrapping. It seems to have become quite political, and some folks burned out on the topic. Perhaps, now, a couple of years later, we can revisit this area and solve this problem in the language itself ? Explicitness An argument I have seen made against automatic Ok-wrapping, and, in general, against any kind of useful language affordance, is that it makes things less explicit. But this argument is fundamentally wrong for Ok-wrapping. Explicitness is not an unalloyed good. We humans have only limited attention. We need to focus that attention where it is actually needed. So explicitness is good in situtions where what is going on is unusual; or would otherwise be hard to read; or is tricky or error-prone. Generally: explicitness is good for things where we need to direct humans attention. But Ok-wrapping is ubiquitous in fallible Rust code. The compiler mechanisms and type systems almost completely defend against mistakes. All but the most novice programmer knows what s going on, and the very novice programmer doesn t need to. Rust s error handling arrangments are designed specifically so that we can avoid worrying about fallibility unless necessary except for the Ok-wrapping. Explicitness about Ok-wrapping directs our attention away from whatever other things the code is doing: it is a distraction. So, explicitness about Ok-wrapping is a bad thing. Appendix - examples showning code with Ok wrapping is worse than code using #[throws] Observe these diffs, from my abandoned attempt to remove the fehler dependency from Hippotat. I have a type alias AE for the usual error type (AE stands for anyhow::Error). In the non-#[throws] code, I end up with a type alias AR<T> for Result<T, AE>, which I think is more opaque but at least that avoids typing out -> Result< , AE> a thousand times. Some people like to have a local Result alias, but that means that the standard Result has to be referred to as StdResult or std::result::Result.

With `fehler` and `#[throws]`	Vanilla Rust, `Result<>`, mandatory `Ok-wrapping`

Return value clearer, error return less wordy:
`impl Parseable for Secret`	`impl Parseable for Secret`
`#[throws(AE)]`
`fn parse(s: Option<&str>) -> Self`	`fn parse(s: Option<&str>) -> AR<Self>`
`let s = s.value()?;`	`let s = s.value()?;`
`if s.is_empty() throw!(anyhow!( secret value cannot be empty ))`	`if s.is_empty() return Err(anyhow!( secret value cannot be empty ))`
`Secret(s.into())`	`Ok(Secret(s.into()))`


No need to wrap whole match statement in `Ok( ):`
`#[throws(AE)]`
`pub fn client<T>(&self, key: & static str, skl: SKL) -> T`	`pub fn client<T>(&self, key: & static str, skl: SKL) -> AR<T>`
`where T: Parseable + Default`	`where T: Parseable + Default`
`match self.end`	`Ok(match self.end`
`LinkEnd::Client => self.ordinary(key, skl)?,`	`LinkEnd::Client => self.ordinary(key, skl)?,`
`LinkEnd::Server => default(),`	`LinkEnd::Server => default(),`
	`)`

Return value and `Ok(())` entirely replaced by `#[throws]`:
`impl Display for Loc`	`impl Display for Loc`
`#[throws(fmt::Error)]`
`fn fmt(&self, f: &mut fmt::Formatter)`	`fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result`
`write!(f, :? : , &self.file, self.lno)?;`	`write!(f, :? : , &self.file, self.lno)?;`
`if let Some(s) = &self.section`	`if let Some(s) = &self.section`
`write!(f, )?;`	`write!(f, )?;`


	`Ok(())`

Call to `write!` now looks the same as in more complex case shown above:
`impl Debug for Secret`	`impl Debug for Secret`
`#[throws(fmt::Error)]`
`fn fmt(&self, f: &mut fmt::Formatter)`	`fn fmt(&self, f: &mut fmt::Formatter)-> fmt::Result`
`write!(f, "Secret(***)")?;`	`write!(f, "Secret(***)")`


Much tiresome `return Ok()` noise removed:
`impl FromStr for SectionName`	`impl FromStr for SectionName`
`type Err = AE;`	`type Err = AE;`
`#[throws(AE)]`
`fn from_str(s: &str) -> Self`	`fn from_str(s: &str) ->AR< Self>`
`match s`	`match s`
`COMMON => return SN::Common,`	`COMMON => return Ok(SN::Common),`
`LIMIT => return SN::GlobalLimit,`	`LIMIT => return Ok(SN::GlobalLimit),`
`_ =>`	`_ =>`
`;`	`;`
`if let Ok(n@ ServerName(_)) = s.parse() return SN::Server(n)`	`if let Ok(n@ ServerName(_)) = s.parse() return Ok(SN::Server(n))`
`if let Ok(n@ ClientName(_)) = s.parse() return SN::Client(n)`	`if let Ok(n@ ClientName(_)) = s.parse() return Ok(SN::Client(n))`

`if client == LIMIT return SN::ServerLimit(server)`	`if client == LIMIT return Ok(SN::ServerLimit(server))`
`let client = client.parse().context( client name in link section name )?;`	`let client = client.parse().context( client name in link section name )?;`
`SN::Link(LinkName server, client )`	`Ok(SN::Link(LinkName server, client ))`

edited 2022-12-18 19:58 UTC to improve, and 2022-12-18 23:28 to fix, formatting

comments

16 November 2022

Ian Jackson: Stop writing Rust linked list libraries!

tl;dr: Don t write a Rust linked list library: they are hard to do well, and usually useless. Use VecDeque, which is great. If you actually need more than VecDeque can do, use one of the handful of libraries that actually offer a significantly more useful API. If you are writing your own data structure, check if someone has done it already, and consider slotmap or generation_arena, (or maybe Rc/Arc). Contents

Survey of Rust linked list libraries I have updated my Survey of Rust linked list libraries. Background In 2019 I was writing plag-mangler, a tool for planar graph layout. I needed a data structure. Naturally I looked for a library to help. I didn t find what I needed, so I wrote rc-dlist-deque. However, on the way I noticed an inordinate number of linked list libraries written in Rust. Most all of these had no real reason for existing. Even the one in the Rust standard library is useless. Results Now I have redone the survey. The results are depressing. In 2019 there were 5 libraries which, in my opinion, were largely useless. In late 2022 there are now thirteen linked list libraries that ought probably not ever to be used. And, a further eight libraries for which there are strictly superior alternatives. Many of these have the signs of projects whose authors are otherwise competent: proper documentation, extensive APIs, and so on. There is one new library which is better for some applications than those available in 2019. (I m referring to generational_token_list, which makes a plausible alternative to dlv-list which I already recommended in 2019.) Why are there so many poor Rust linked list libraries ? Linked lists and Rust do not go well together. But (and I m guessing here) I presume many people are taught in programming school that a linked list is a fundamental data structure; people are often even asked to write one as a teaching exercise. This is a bad idea in Rust. Or maybe they ve heard that writing linked lists in Rust is hard and want to prove they can do it. Double-ended queues One of the main applications for a linked list in a language like C, is a queue, where you put items in at one end, and take them out at the other. The Rust standard library has a data structure for that, VecDeque. Five of the available libraries:

Have an API which is a subset of that of VecDeque: basically, pushing and popping elements at the front and back.
Have worse performance for most applications than VecDeque,
Are less mature, less available, less well tested, etc., than VecDeque, simply because VecDeque is in the Rust Standard Library.

For these you could, and should, just use VecDeque instead. The Cursor concept A proper linked list lets you identify and hold onto an element in the middle of the list, and cheaply insert and remove elements there. Rust s ownership and borrowing rules make this awkward. One idea that people have many times reinvented and reimplemented, is to have a Cursor type, derived from the list, which is a reference to an element, and permits insertion and removal there. Eight libraries have implemented this in the obvious way. However, there is a serious API limitation: To prevent a cursor being invalidated (e.g. by deletion of the entry it points to) you can t modify the list while the cursor exists. You can only have one cursor (that can be used for modification) at a time. The practical effect of this is that you cannot retain cursors. You can make and use such a cursor for a particular operation, but you must dispose of it soon. Attempts to do otherwise will see you losing a battle with the borrow checker. If that s good enough, then you could just use a VecDeque and use array indices instead of the cursors. It s true that deleting or adding elements in the middle involves a lot of copying, but your algorithm is O(n) even with the single-cursor list libraries, because it must first walk the cursor to the desired element. Formally, I believe any algorithm using these exclusive cursors can be rewritten, in an obvious way, to simply iterate and/or copy from the start or end (as one can do with VecDeque) without changing the headline O() performance characteristics. IMO the savings available from avoiding extra copies etc. are not worth the additional dependency, unsafe code, and so on, especially as there are other ways of helping with that (e.g. boxing the individual elements). Even if you don t find that convincing, generational_token_list and dlv_list are strictly superior since they offer a more flexible and convenient API and better performance, and rely on much less unsafe code. Rustic approaches to pointers-to-and-between-nodes data structures Most of the time a VecDeque is great. But if you actually want to hold onto (perhaps many) references to the middle of the list, and later modify it through those references, you do need something more. This is a specific case of a general class of problems where the naive approach (use Rust references to the data structure nodes) doesn t work well. But there is a good solution: Keep all the nodes in an array (a Vec<Option<T>> or similar) and use the index in the array as your node reference. This is fast, and quite ergonomic, and neatly solves most of the problems. If you are concerned that bare indices might cause confusion, as newly inserted elements would reuse indices, add a per-index generation count. These approaches have been neatly packaged up in libraries like slab, slotmap, generational-arena and thunderdome. And they have been nicely applied to linked lists by the authors of generational_token_list. and dlv-list. The alternative for nodey data structures in safe Rust: Rc/Arc Of course, you can just use Rust s interior mutability and reference counting smart pointers, to directly implement the data structure of your choice. In many applications, a single-threaded data structure is fine, in which case Rc and Cell/RefCell will let you write safe code, with cheap refcount updates and runtime checks inserted to defend against unexpected aliasing, use-after-free, etc. I took this approach in rc-dlist-deque, because I wanted each node to be able to be on multiple lists. Rust s package ecosystem demonstrating software s NIH problem The Rust ecosystem is full of NIH libraries of all kinds. In my survey, there are: five good options; seven libraries which are plausible, but just not as good as the alternatives; and fourteen others. There is a whole rant I could have about how the whole software and computing community is pathologically neophilic. Often we seem to actively resist reusing ideas, let alone code; and are ignorant and dismissive of what has gone before. As a result, we keep solving the same problems, badly - making the same mistakes over and over again. In some subfields, working software, or nearly working software, is frequently replaced with something worse, maybe more than once. One aspect of this is a massive cultural bias towards rewriting rather than reusing, let alone fixing and using. Many people can come out of a degree, trained to be a programmer, and have no formal training in selecting and evaluating software; this is even though working effectively with computers requires making good use of everyone else s work. If one isn t taught these skills (when and how to search for prior art, how to choose between dependencies, and so on) one must learn it on the job. The result is usually an ad-hoc and unsystematic approach, often dominated by fashion rather than engineering. The package naming paradox The more experienced and competent programmer is aware of all the other options that exist - after all they have evaluated other choices before writing their own library. So they will call their library something like generational_token_list or vecdeque-stableix. Whereas the novice straight out of a pre-Rust programming course just thinks what they are doing is the one and only obvious thing (even though it s a poor idea) and hasn t even searched for a previous implementation. So they call their package something obvious like linked list . As a result, the most obvious names seem to refer to the least useful libraries.

Edited 2022-11-16 23:55 UTC to update numbers of libraries in various categories following updates to the survey (including updates prompted by feedback received after this post first published).

comments

10 October 2022

Ian Jackson: Skipping releases when upgrading Debian systems

Debian does not officially support upgrading from earlier than the previous stable release: you re not supposed to skip releases. Instead, you re supposed to upgrade to each intervening major release in turn. However, skipping intervening releases does, in fact, often work quite well. Apparently, this is surprising to many people, even Debian insiders. I was encouraged to write about it some more. My personal experience I have three conventionally-managed personal server systems (by which I mean systems which aren t reprovisioned by some kind of automation). Of these at least two have been skip upgraded at least once: The one I don t think I ve skip-upgraded (at least, not recently) is my house network manager (and now VM host) which I try to keep to a minimum in terms of functionality and which I keep quite up to date. It was crossgraded from i386 (32-bit) to amd64 (64-bit) fairly recently, which is a thing that Debian isn t sure it supports. The crossgrade was done a hurry and without any planning, prompted by Spectre et al suddenly requiring big changes to Xen. But it went well enough. My home does random stuff server (media server, web cache, printing, DNS, backups etc.), has etckeeper records starting in 2015. I upgraded directly from jessie (Debian 8) to buster (Debian 10). I think it has probably had earlier skip upgrade(s): the oldest file in /etc is from December 1996 and I have been doing occasional skip upgrades as long as I can remember. And of course there s chiark, which is one of the oldest Debian installs in existence. I wrote about the most recent upgrade, where I went directly from jessie i386 ELTS (32-bit Debian 8) to bulleye amd64 (64-bit Debian 11). That was a very extreme case which required significant planning and pre-testing, since the package dependencies were in no way sufficient for the proper ordering. But, I don t normally go to such lengths. Normally, even on chiark, I just edit the sources.list and see what apt proposes to do. I often skip upgrade chiark because I tend to defer risky-looking upgrades partly in the hope of others fixing the bugs while I wait :-), and partly just because change is disruptive and amortising it is very helpful both to me and my users. I have some records of chiark s upgrades from my announcements to users. As well as the recent skip skip up cross grade, direct , I definitely did a skip upgrade of chiark from squeeze (Debian 6) to jessie (Debian 8). It appears that the previous skip upgrade on chiark was rex (Debian 1.2) to hamm (Debian 2.0). I don t think it s usual for me to choose to do a multi-release upgrade the officially supported way, in two (or more) stages, on a server. I have done that on systems with a GUI desktop setup, but even then I usually skip the intermediate reboot(s). When to skip upgrade (and what precautions to take) I m certainly not saying that everyone ought to be doing this routinely. Most users with a Debian install that is older than oldstable probably ought to reinstall it, or do the two-stage upgrade. Skip upgrading almost always runs into some kind of trouble (albeit, usually trouble that isn t particularly hard to fix if you know what you re doing). However, officially supported non-skip upgrades go wrong too. Doing a two-or-more-releases upgrade via the intermediate releases can expose you to significant bugs in the intermediate releases, which were later fixed. Because Debian s users and downstreams are cautious, and Debian itself can be slow, it is common for bugs to appear for one release and then be fixed only in the next. Paradoxically, this seems to be especially true with the kind of big and scary changes where you d naively think the upgrade mechanisms would break if you skipped the release where the change first came in. I would not recommend a skip upgrade to someone who is not a competent Debian administrator, with good familiarity with Debian package management, including use of dpkg directly to fix things up. You should have a mental toolkit of manual bug workaround techniques. I always should make sure that I have rescue media (and in the case of a remote system, full remote access including ability to boot a different image), although I don t often need it. And, when considering a skip upgrade, you should be aware of the major changes that have occurred in Debian. Skip upgrading is more likely to be a good idea with a complex and highly customised system: a fairly vanilla install is not likely to encounter problems during a two-stage update. (And, a vanilla system can be upgraded by reinstalling.) I haven t recently skip upgraded a laptop or workstation. I doubt I would attempt it; modern desktop software seems to take a much harder line about breaking things that are officially unsupported, and generally trying to force everyone into the preferred mold. A request to Debian maintainers I would like to encourage Debian maintainers to defer removing upgrade compatibility machinery until it is actually getting in the way, or has become hazardous, or many years obsolete. Examples of the kinds of things which it would be nice to keep, and which do not usually cause much inconvenience to retain, are dependency declarations (particularly, alternatives), and (many) transitional fragments in maintainer scripts. If you find yourself needing to either delete some compatibility feature, or refactor/reorganise it, I think it is probably best to delete it. If you modify it significantly, the resulting thing (which won t be tested until someone uses it in anger) is quite likely to have bugs which make it go wrong more badly (or, more confusingly) than the breakage that would happen without it. Obviously this is all a judgement call. I m not saying Debian should formally support skip upgrades, to the extent of (further) slowing down important improvements. Nor am I asking for any change to the routine approach to (for example) transitional packages (i.e. the technique for ensuring continuity of installation when a package name changes). We try to make release upgrades work perfectly; but skip upgrades don t have to work perfectly to be valuable. Retaining compatibility code can also make it easier to provide official backports, and it probably helps downstreams with different release schedules. The fact that maintainers do in practice often defer removing compatibility code provides useful flexibility and options to at least some people. So it would be nice if you d at least not go out of your way to break it. Building on older releases I would also like to encourage maintainers to provide source packages in Debian unstable that will still build on older releases, where this isn t too hard and the resulting binaries might be basically functional. Speaking personally, it s not uncommon for me to rebuild packages from unstable and install them on much older releases. This is another thing that is not officially supported, but which often works well. I m not saying to contort your build system, or delay progress. You ll definitely want to depend on a recentish debhelper. But, for example, retaining old build-dependency alternatives is nice. Retaining them doesn t constitute a promise that it works - it just makes life slightly easier for someone who is going off piste. If you know you have users on multiple distros or multiple releases, and wish to fully support them, you can go further, of course. Many of my own packages are directly buildable, or even directly installable, on older releases.

comments

28 September 2022

Ian Jackson: Hippotat (IP over HTTP) - first advertised release

I have released version 1.0.0 of Hippotat, my IP-over-HTTP system. To quote the README:

You re in a cafe or a hotel, trying to use the provided wifi. But it s not working. You discover that port 80 and port 443 are open, but the wifi forbids all other traffic.

Never mind, start up your hippotat client. Now you have connectivity. Your VPN and SSH and so on run over Hippotat. The result is not very efficient, but it does work.

Story In early 2017 I was in a mountaintop cafeteria, hoping to do some work on my laptop. (For Reasons I couldn t go skiing that day.) I found that local wifi was badly broken: It had a severe port block. I had to use my port 443 SSH server to get anywhere. My usual arrangements punt everything over my VPN, which uses UDP of course, and I had to bodge several things. Using a web browser directly only the wifi worked normally, of course - otherwise the other guests would have complained. This was not the first experience like this I d had, but this time I had nothing much else to do but fix it. In a few furious hacking sessions, I wrote Hippotat, a tool for making my traffic look enough like ordinary web browsing that it gets through most stupid firewalls. That Python version of Hippotat served me well for many years, despite being rather shonky, extremely inefficient in CPU (and therefore battery) terms and not very productised. But recently things have started to go wrong. I was using Twisted Python and there was what I think must be some kind of buffer handling bug, which started happening when I upgraded the OS (getting newer versions of Python and the Twisted libraries). The Hippotat code, and the Twisted APIs, were quite convoluted, and I didn t fancy debugging it. So last year I rewrote it in Rust. The new Rust client did very well against my existing servers. To my shame, I didn t get around to releasing it. However, more recently I upgraded the server hosts my Hippotat daemons run on to recent Debian releases. They started to be affected by the bug too, rendering my Rust client unuseable. I decided I had to deploy the Rust server code. This involved some packaging work. Having done that, it s time to release it: Hippotat 1.0.0 is out. The package build instructions are rather strange My usual approach to releasing something like this would be to provide a git repository containing a proper Debian source package. I might also build binaries, using sbuild, and I would consider actually uploading to Debian. However, despite me taking a fairly conservative approach to adding dependencies to Hippotat, still a couple of the (not very unusual) Rust packages that Hippotat depends on are not in Debian. Last year I considered tackling this head-on, but I got derailed by difficulties with Rust packaging in Debian. Furthermore, the version of the Rust compiler itself in Debian stable is incapable of dealing with recent versions of very many upstream Rust packages, because many packages most recent versions now require the 2021 Edition of Rust. Sadly, Rust s package manager, cargo, has no mechanism for trying to choose dependency versions that are actually compatible with the available compiler; efforts to solve this problem have still not borne the needed fruit. The result is that, in practice, currently Hippotat has to be built with (a) a reasonably recent Rust toolchain such as found in Debian unstable or obtained from Rust upstream; (b) dependencies obtained from the upstream Rust repository. At least things aren t completely terrible: Rustup itself, despite its alarming install rune, has a pretty good story around integrity, release key management and so on. And with the right build rune, cargo will check not just the versions, but the precise content hashes, of the dependencies to be obtained from crates.io, against the information I provide in the Cargo.lock file. So at least when you build it you can be sure that the dependencies you re getting are the same ones I used myself when I built and tested Hippotat. And there s only 147 of them (counting indirect dependencies too), so what could possibly go wrong? Sadly the resulting package build system cannot work with Debian s best tool for doing clean and controlled builds, sbuild. Under the circumstances, I don t feel I want to publish any binaries.

comments

27 September 2022

Steve McIntyre: Firmware again - updates, how I'm voting and why!

Updates Back in April I wrote about issues with how we handle firmware in Debian, and I also spoke about it at DebConf in July. Since then, we've started the General Resolution process - this led to a lot of discussion on the the debian-vote mailing list and we're now into the second week of the voting phase. The discussion has caught the interest of a few news sites along the way:

Debian to vote on its firmware path in Linux Weekly News
Debian Begins A General Resolution To Decide What To Do With Non-Free Firmware in Phoronix
Debian set to start vote on including non-free packages in install media in ITWire
Why Debian Might Include Non-Free Firmware in Future Releases in MakeUseOf

My vote I've also had several people ask me how I'm voting myself, as I started this GR in the first place. I'm happy to oblige! Here's my vote, sorted into preference order:

  [1] Choice 5: Change SC for non-free firmware in installer, one installer
  [2] Choice 1: Only one installer, including non-free firmware
  [3] Choice 6: Change SC for non-free firmware in installer, keep both installers
  [4] Choice 2: Recommend installer containing non-free firmware
  [5] Choice 3: Allow presenting non-free installers alongside the free one
  [6] Choice 7: None Of The Above
  [7] Choice 4: Installer with non-free software is not part of Debian

Why have I voted this way? Fundamentally, my motivation for starting this vote was to ask the project for clear positive direction on a sensible way forward with non-free firmware support. Thus, I've voted all of the options that do that above NOTA. On those terms, I don't like Choice 4 here - IMHO it leaves us in the same unclear situation as before. I'd be happy for us to update the Social Contract for clarity, and I know some people would be much more comfortable if we do that explicitly here. Choice 1 was my initial personal preference as we started the GR, but since then I've been convinced that also updating the SC would be a good idea, hence Choice 5. I'd also rather have a single image / set of images produced, for the two reasons I've outlined before. It's less work for our images team to build and test all the options. But, much more importantly: I believe it's less likely to confuse new users. I appreciate that not everybody agrees with me here, and this is part of the reason why we're voting! Other Debian people have also blogged about their voting choices (Gunnar Wolf and Ian Jackson so far), and I thank them for sharing their reasoning too. For the avoidance of doubt: my goal for this vote was simply to get a clear direction on how to proceed here. Although I proposed Choice 1 (Only one installer, including non-free firmware), I also seconded several of the other ballot options. Of course I will accept the will of the project when the result is announced - I'm not going to do anything silly like throw a tantrum or quit the project over this! Finally If you're a DD and you haven't voted already, please do so - this is an important choice for the Debian project.

24 September 2022

Ian Jackson: Please vote in favour of the Debian Social Contract change

tl;dr: Please vote in favour of the Debian Social Contract change, by ranking all of its options above None of the Above. Rank the SC change options above corresponding options that do not change the Social Contract. Vote to change the SC even if you think the change is not necessary for Debian to prominently/officially provide an installer with-nonfree-firmware. Why vote for SC change even if I think it s not needed? I m addressing myself primarily to the reader who agrees with me that Debian ought to be officially providing with-firmware images. I think it is very likely that the winning option will be one of the ones which asks for an official and prominent with-firmware installer. However, many who oppose this change believe that it would be a breach of Debian s Social Contract. This is a very reasonable and arguable point of view. Indeed, I m inclined to share it. If the winning option is to provide a with-firmware installer (perhaps, only a with-firmware installer) those people will feel aggrieved. They will, quite reasonably, claim that the result of the vote is illegitimate - being contrary to Debian s principles as set out in the Social Contract, which require a 3:1 majority to change. There is even the possibility that the Secretary may declare the GR result void, as contrary to the Constitution! (Sadly, I am not making this up.) This would cast Debian into (yet another) acrimonious constitutional and governance crisis. The simplest answer is to amend the Social Contract to explicitly permit what is being proposed. Holger s option F and Russ s option E do precisely that. Amending the SC is not an admission that it was legally necessary to do so. It is practical politics: it ensures that we have clear authority and legitimacy. Aren t we softening Debian s principles? I think prominently distributing an installer that can work out of the box on the vast majority of modern computers would help Debian advance our users freedom. I see user freedom as a matter of practical capability, not theoretical purity. Anyone living in the modern world must make compromises. It is Debian s job to help our users (and downstreams) minimise those compromises and retain as much control as possible over the computers in their life. Insisting that a user buys different hardware, or forcing them to a different distro, does not serve that goal. I don t really expect to convince anyone with such a short argument, but I do want to make the point that providing an installer that users can use to obtain a lot of practical freedom is also, for many of us, a matter of principle.

comments

23 August 2022

Ian Jackson: prefork-interp - automatic startup time amortisation for all manner of scripts

The problem I had - Mason, so, sadly, FastCGI Since the update to current Debian stable, the website for YARRG, (a play-aid for Puzzle Pirates which I wrote some years ago), started to occasionally return Internal Server Error , apparently due to bug(s) in some FastCGI libraries. I was using FastCGI because the website is written in Mason, a Perl web framework, and I found that Mason CGI calls were slow. I m using CGI - yes, trad CGI - via userv-cgi. Running Mason this way would compile the template for each HTTP request just when it was rendered, and then throw the compiled version away. The more modern approach of an application server doesn t scale well to a system which has many web applications most of which are very small. The admin overhead of maintaining a daemon, and corresponding webserver config, for each such service would be prohibitive, even with some kind of autoprovisioning setup. FastCGI has an interpreter wrapper which seemed like it ought to solve this problem, but it s quite inconvenient, and often flaky. I decided I could do better, and set out to eliminate FastCGI from my setup. The result seems to be a success; once I d done all the hard work of writing prefork-interp, I found the result very straightforward to deploy. prefork-interp prefork-interp is a small C program which wraps a script, plus a scripting language library to cooperate with the wrapper program. Together they achieve the following:

Startup cost of the script (loading modules it uses, precompuations, loading and processing of data files, etc.) is paid once, and reused for subsequent invocations of the same script.
Minimal intervention to the script source code:
- one new library to import
- one new call to make from that library, right after the script intialisation is complete
- change to the #! line.
The new initialisation complete call turns the program into a little server (a daemon), and then returns once for each actual invocation, each time in a fresh grandchild process.

Features:

Seamless reloading on changes to the script source code (automatic, and configurable).
Concurrency limiting.
Options for distinguishing different configurations of the same script so that they get a server each.
You can run the same script standalone, as a one-off execution, as well as under prefork-interp.
Currently, a script-side library is provided for Perl. I m pretty sure Python would be fairly straightforward.

Important properties not always satisfied by competing approaches:

Error output (stderr) and exit status from both phases of the script code execution faithfully reproduced to the calling context. Environment, arguments, and stdin/stdout/stderr descriptors, passed through to each invocation.
No polling, other than a long-term idle timeout, so good on laptops (or phones).
Automatic lifetime management of the per-script server, including startup and cleanup. No integration needed with system startup machinery: No explicit management of daemons, init scripts, systemd units, cron jobs, etc.
Useable right away without fuss for CGI programs but also for other kinds of program invocation.
(I believe) reliable handling of unusual states arising from failed invocations or races.

Swans paddling furiously The implementation is much more complicated than the (apparent) interface. I won t go into all the details here (there are some terrifying diagrams in the source code if you really want), but some highlights: We use an AF_UNIX socket (hopefully in /run/user/UID, but in ~ if not) for rendezvous. We can try to connect without locking, but we must protect the socket with a separate lockfile to avoid two concurrent restart attempts. We want stderr from the script setup (pre-initialisation) to be delivered to the caller, so the script ought to inherit our stderr and then will need to replace it later. Twice, in fact, because the daemonic server process can t have a stderr. When a script is restarted for any reason, any old socket will be removed. We want the old server process to detect that and quit. (If hung about, it would wait for the idle timeout; if this happened a lot - eg, a constantly changing set of services - we might end up running out of pids or something.) Spotting the socket disappearing, without polling, involves use of a library capable of using inotify (or the equivalent elsewhere). Choosing a C library to do this is not so hard, but portable interfaces to this functionality can be hard to find in scripting languages, and also we don t want every language binding to have to reimplement these checks. So for this purpose there s a little watcher process, and associated IPC. When an invoking instance of prefork-interp is killed, we must arrange for the executing service instance to stop reading from its stdin (and, ideally, writing its stdout). Otherwise it s stealing input from prefork-interp s successors (maybe the user s shell)! Cleanup ought not to depend on positive actions by failing processes, so each element of the system has to detect failures of its peers by means such as EOF on sockets/pipes. Obtaining prefork-interp I put this new tool in my chiark-utils package, which is a collection of useful miscellany. It s available from git. Currently I make releases by uploading to Debian, where prefork-interp has just hit Debian unstable, in chiark-utils 7.0.0. Support for other scripting languages I would love Python to be supported. If any pythonistas reading this think you might like to help out, please get in touch. The specification for the protocol, and what the script library needs to do, is documented in the source code Future plans for chiark-utils chiark-utils as a whole is in need of some tidying up of its build system and packaging. I intend to try to do some reorganisation. Currently I think it would be better to organising the source tree more strictly with a directory for each included facility, rather than grouping compiled and scripts together. The Debian binary packages should be reorganised more fully according to their dependencies, so that installing a program will ensure that it works. I should probably move the official git repo from my own git+gitweb to a forge (so we can have MRs and issues and so on). And there should be a lot more testing, including Debian autopkgtests.

edited 2022-08-23 10:30 +01:00 to improve the formatting

comments

8 August 2022

Ian Jackson: dkim-rotate - rotation and revocation of DKIM signing keys

Background Internet email is becoming more reliant on DKIM, a scheme for having mail servers cryptographically sign emails. The Big Email providers have started silently spambinning messages that lack either DKIM signatures, or SPF. DKIM is arguably less broken than SPF, so I wanted to deploy it. But it has a problem: if done in a naive way, it makes all your emails non-repudiable, forever. This is not really a desirable property - at least, not desirable for you, although it can be nice for someone who (for example) gets hold of leaked messages obtained by hacking mailboxes. This problem was described at some length in Matthew Green s article Ok Google: please publish your DKIM secret keys. Following links from that article does get you to a short script to achieve key rotation but it had a number of problems, and wasn t useable in my context. dkim-rotate So I have written my own software for rotating and revoking DKIM keys: dkim-rotate. I think it is a good solution to this problem, and it ought to be deployable in many contexts (and readily adaptable to those it doesn t already support). Here s the feature list taken from the README:

Leaked emails become unattestable (plausibily deniable) within a few days soon after the configured maximum email propagation time.
Mail domain DNS configuration can be static, and separated from operational DKIM key rotation. Domain owner delegates DKIM configuration to mailserver administrator, so that dkim-rotate does not need to edit your mail domain s zone.
When a single mail server handles multiple mail domains, only a single dkim-rotate instance is needed.
Supports situations where multiple mail servers may originate mails for a single mail domain.
DNS zonefile remains small; old keys are published via a webserver, rather than DNS.
Supports runtime (post-deployment) changes to tuning parameters and configuration settings. Careful handling of errors and out-of-course situations.
Convenient outputs: a standard DNS zonefile; easily parseable settings for the MTA; and, a directory of old keys directly publishable by a webserver.

Complications It seems like it should be a simple problem. Keep N keys, and every day (or whatever), generate and start using a new key, and deliberately leak the oldest private key. But, things are more complicated than that. Considerably more complicated, as it turns out. I didn t want the DKIM key rotation software to have to edit the actual DNS zones for each relevant mail domain. That would tightly entangle the mail server administration with the DNS administration, and there are many contexts (including many of mine) where these roles are separated. The solution is to use DNS aliases (CNAME). But, now we need a fixed, relatively small, set of CNAME records for each mail domain. That means a fixed, relatively small set of key identifiers ( selectors in DKIM terminology), which must be used in rotation. We don t want the private keys to be published via the DNS because that makes an ever-growing DNS zone, which isn t great for performance; and, because we want to place barriers in the way of processes which might enumerate the set of keys we use (and the set of keys we have leaked) and keep records of what status each key had when. So we need a separate publication channel - for which a webserver was the obvious answer. We want the private keys to be readily noticeable and findable by someone who is verifying an alleged leaked email dump, but to be hard to enumerate. (One part of the strategy for this is to leave a note about it, with the prospective private key url, in the email headers.) The key rotation operations are more complicated than first appears, too. The short summary, above, neglects to consider the fact that DNS updates have a nonzero propagation time: if you change the DNS, not everyone on the Internet will experience the change immediately. So as well as a timeout for how long it might take an email to be delivered (ie, how long the DKIM signature remains valid), there is also a timeout for how long to wait after updating the DNS, before relying on everyone having got the memo. (This same timeout applies both before starting to sign emails with a new key, and before deliberately compromising a key which has been withdrawn and deadvertised.) Updating the DNS, and the MTA configuration, are fallible operations. So we need to cope with out-of-course situations, where a previous DNS or MTA update failed. In that case, we need to retry the failed update, and not proceed with key rotation. We mustn t start the timer for the key rotation until the update has been implemented. The rotation script will usually be run by cron, but it might be run by hand, and when it is run by hand it ought not to jump the gun and do anything too early (ie, before the relevant timeout has expired). cron jobs don t always run, and don t always run at precisely the right time. (And there s daylight saving time, to consider, too.) So overall, it s not sufficient to drive the system via cron and have it proceed by one unit of rotation on each run. And, hardest of all, I wanted to support post-deployment configuration changes, while continuing to keep the whole the system operational. Otherwise, you have to bake in all the timing parameters right at the beginning and can t change anything ever. So for example, I wanted to be able to change the email and DNS propagation delays, and even the number of selectors to use, without adversely affecting the delivery of already-sent emails, and without having to shut anything down. I think I have solved these problems. The resulting system is one which keeps track of the timing constraints, and the next event which might occur, on a per-key basis. It calculates on each run, which key(s) can be advanced to the next stage of their lifecycle, and performs the appropriate operations. The regular key update schedule is then an emergent property of the config parameters and cron job schedule. (I provide some example config.) Exim Integrating dkim-rotate itself with Exim was fairly easy. The lsearch lookup function can be used to fish information out of a suitable data file maintained by dkim-rotate. But a final awkwardness was getting Exim to make the right DKIM signatures, at the right time. When making a DKIM signature, one must choose a signing authority domain name: who should we claim to be? (This is the SDID in DKIM terms.) A mailserver that handles many different mail domains will be able to make good signatures on behalf of many of them. It seems to me that domain to be the mail domain in the From: header of the email. (The RFC doesn t seem to be clear on what is expected.) Exim doesn t seem to have anything builtin to do that. And, you only want to DKIM-sign emails that are originated locally or from trustworthy sources. You don t want to DKIM-sign messages that you received from the global Internet, and are sending out again (eg because of an email alias or mailing list). In theory if you verify DKIM on all incoming emails, you could avoid being fooled into signing bad emails, but rejecting all non-DKIM-verified email would be a very strong policy decision. Again, Exim doesn t seem to have cooked machinery. The resulting Exim configuration parameters run to 22 lines, and because they re parameters to an existing config item (the smtp transport) they can t even easily be deployed as a drop-in file via Debian s split config Exim configuration scheme. (I don t know if the file written for Exim s use by dkim-rotate would be suitable for other MTAs, but this part of dkim-rotate could easily be extended.) Conclusion I have today released dkim-rotate 0.4, which is the first public release for general use. I have it deployed and working, but it s new so there may well be bugs to work out. If you would like to try it out, you can get it via git from Debian Salsa. (Debian folks can also find it freshly in Debian unstable.)

comments

30 July 2022

Ian Jackson: chiark s skip-skip-cross-up-grade

Two weeks ago I upgraded chiark from Debian jessie i386 to bullseye amd64, after nearly 30 years running Debian i386. This went really quite well, in fact! Background chiark is my colo - a server I run, which lives in a data centre in London. It hosts ~200 users with shell accounts, various websites and mailing lists, moderators for a number of USENET newsgroups, and countless other services. chiark s internal setup is designed to enable my users to do a maximum number of exciting things with a minimum of intervention from me. chiark s OS install dates to 1993, when I installed Debian 0.93R5, the first version of Debian to advertise the ability to be upgraded without reinstalling. I think that makes it one of the oldest Debian installations in existence. Obviously it s had several new hardware platforms too. (There was a prior install of Linux on the initial hardware, remnants of which can maybe still be seen in some obscure corners of chiark s /usr/local.) chiark s install is also at the very high end of the installation complexity, and customisation, scale: reinstalling it completely would be an enormous amount of work. And it s unique. chiark s upgrade history chiark s last major OS upgrade was to jessie (Debian 8, released in April 2015). That was in 2016. Since then we have been relying on Debian s excellent security support posture, and the Debian LTS and more recently Freexian s Debian ELTS projects and some local updates, The use of ELTS - which supports only a subset of packages - was particularly uncomfortable. Additionally, chiark was installed with 32-bit x86 Linux (Debian i386), since that was what was supported and available at the time. But 32-bit is looking very long in the tooth. Why do a skip upgrade So, I wanted to move to the fairly recent stable release - Debian 11 (bullseye), which is just short of a year old. And I wanted to crossgrade (as its called) to 64-bit. In the past, I have found I have had greater success by doing direct upgrades, skipping intermediate releases, rather than by following the officially-supported path of going via every intermediate release. Doing a skip upgrade avoids exposure to any packaging bugs which were present only in intermediate release(s). Debian does usually fix bugs, but Debian has many cautious users, so it is not uncommon for bugs to be found after release, and then not be fixed until the next one. A skip upgrade avoids the need to try to upgrade to already-obsolete releases (which can involve messing about with multiple snapshots from snapshot.debian.org. It is also significantly faster and simpler, which is important not only because it reduces downtime, but also because it removes opportunities (and reduces the time available) for things to go badly. One downside is that sometimes maintainers aggressively remove compatibility measures for older releases. (And compatibililty packages are generally removed quite quickly by even cautious maintainers.) That means that the sysadmin who wants to skip-upgrade needs to do more manual fixing of things that haven t been dealt with automatically. And occasionally one finds compatibility problems that show up only when mixing very old and very new software, that no-one else has seen. Crossgrading Crossgrading is fairly complex and hazardous. It is well supported by the low level tools (eg, dpkg) but the higher-level packaging tools (eg, apt) get very badly confused. Nowadays the system is so complex that downloading things by hand and manually feeding them to dpkg is impractical, other than as a very occasional last resort. The approach, generally, has been to set the system up to want to be the new architecture, run apt in a download-only mode, and do the package installation manually, with some fixing up and retrying, until the system is coherent enough for apt to work. This is the approach I took. (In current releases, there are tools that will help but they are only in recent releases and I wanted to go direct. I also doubted that they would work properly on chiark, since it s so unusual.) Peril and planning Overall, this was a risky strategy to choose. The package dependencies wouldn t necessarily express all of the sequencing needed. But it still seemed that if I could come up with a working recipe, I could do it. I restored most of one of chiark s backups onto a scratch volume on my laptop. With the LVM snapshot tools and chroots. I was able to develop and test a set of scripts that would perform the upgrade. This was a very effective approach: my super-fast laptop, with local caches of the package repositories, was able to do many edit, test, debug cycles. My recipe made heavy use of snapshot.debian.org, to make sure that it wouldn t rot between testing and implementation. When I had a working scheme, I told my users about the planned downtime. I warned everyone it might take even 2 or 3 days. I made sure that my access arrangemnts to the data centre were in place, in case I needed to visit in person. (I have remote serial console and power cycler access.) Reality - the terrible rescue install My first task on taking the service down was the check that the emergency rescue installation worked: chiark has an ancient USB stick in the back, which I can boot to from the BIOS. The idea being that many things that go wrong could be repaired from there. I found that that install was too old to understand chiark s storage arrangements. mdadm tools gave very strange output. So I needed to upgrade it. After some experiments, I rebooted back into the main install, bringing chiark s service back online. I then used the main install of chiark as a kind of meta-rescue-image for the rescue-image. The process of getting the rescue image upgraded (not even to amd64, but just to something not totally ancient) was fraught. Several times I had to rescue it by copying files in from the main install outside. And, the rescue install was on a truly ancient 2G USB stick which was terribly terribly slow, and also very small. I hadn t done any significant planning for this subtask, because it was low-risk: there was little way to break the main install. Due to all these adverse factors, sorting out the rescue image took five hours. If I had known how long it would take, at the beginning, I would have skipped it. 5 hours is more than it would have taken to go to London and fix something in person. Reality - the actual core upgrade I was able to start the actual upgrade in the mid-afternoon. I meticulously checked and executed the steps from my plan. The terrifying scripts which sequenced the critical package updates ran flawlessly. Within an hour or so I had a system which was running bullseye amd64, albeit with many important packages still missing or unconfigured. So I didn t need the rescue image after all, nor to go to the datacentre. Fixing all the things Then I had to deal with all the inevitable fallout from an upgrade. Notable incidents: exim4 has a new tainting system This is to try to help the sysadmin avoid writing unsafe string interpolations. ( Little Bobby Tables. ) This was done by Exim upstream in a great hurry as part of a security response process. The new checks meant that the mail configuration did not work at all. I had to turn off the taint check completely. I m fairly confident that this is correct, because I am hyper-aware of quoting issues and all of my configuration is written to avoid the problems that tainting is supposed to avoid. One particular annoyance is that the approach taken for sqlite lookups makes it totally impossible to use more than one sqlite database. I think the sqlite quoting operator which one uses to interpolate values produces tainted output? I need to investigate this properly. LVM now ignores PVs which are directly contained within LVs by default chiark has LVM-on-RAID-on-LVM. This generally works really well. However, there was one edge case where I ended up without the intermediate RAID layer. The result is LVM-on-LVM. But recent versions of the LVM tools do not look at PVs inside LVs, by default. This is to help you avoid corrupting the state of any VMs you have on your system. I didn t know that at the time, though. All I knew was that LVM was claiming my PV was unusable , and wouldn t explain why. I was about to start on a thorough reading of the 15,000-word essay that is the commentary in the default /etc/lvm/lvm.conf to try to see if anything was relevant, when I received a helpful tipoff on IRC pointing me to the scan_lvs option. I need to file a bug asking for the LVM tools to explain why they have declared a PV unuseable. apache2 s default config no longer read one of my config files I had to do a merge (of my changes vs the maintainers changes) for /etc/apache2/apache2.conf. When doing this merge I failed to notice that the file /etc/apache2/conf.d/httpd.conf was no longer included by default. My merge dropped that line. There were some important things in there, and until I found this the webserver was broken. dpkg --skip-same-version DTWT during a crossgrade (This is not a fix all the things - I found it when developing my upgrade process.) When doing a crossgrade, one often wants to say to dpkg install all these things, but don t reinstall things that have already been done . That s what --skip-same-version is for. However, the logic had not been updated as part of the work to support multiarch, so it was wrong. I prepared a patched version of dpkg, and inserted it in the appropriate point in my prepared crossgrade plan. The patch is now filed as bug #1014476 against dpkg upstream Mailman Mailman is no longer in bullseye. It s only available in the previous release, buster. bullseye has Mailman 3 which is a totally different system - requiring basically, a completely new install and configuration. To even preserve existing archive links (a very important requirement) is decidedly nontrivial. I decided to punt on this whole situation. Currently chiark is running buster s version of Mailman. I will have to deal with this at some point and I m not looking forward to it. Python Of course that Mailman is Python 2. The Python project s extremely badly handled transition includes a recommendation to change the meaning of #!/usr/bin/python from Python 2, to Python 3. But Python 3 is a new language, barely compatible with Python 2 even in the most recent iterations of both, and it is usual to need to coinstall them. Happily Debian have provided the python-is-python2 package to make things work sensibly, albeit with unpleasant imprecations in the package summary description. USENET news Oh my god. INN uses many non-portable data formats, which just depend on your C types. And there are complicated daemons, statically linked libraries which cache on-disk data, and much to go wrong. I had numerous problems with this, and several outages and malfunctions. I may write about that on a future occasion.

(edited 2022-07-20 11:36 +01:00 and 2022-07-30 12:28+01:00 to fix typos)

comments

2 April 2022

Ian Jackson: Otter (game server) 1.0.0 released

I have just released Otter 1.0.0. Recap: what is Otter Otter is my game server for arbitrary board games. Unlike most online game systems. It does not know (nor does it need to know) the rules of the game you are playing. Instead, it lets you and your friends play with common tabletop/boardgame elements such as hands of cards, boards, and so on. So it s something like a tabletop simulator (but it does not have any 3D, or a physics engine, or anything like that). There are provided game materials and templates for Penultima, Mao, and card games in general. Otter also supports uploadable game bundles, which allows users to add support for additional games - and this can be done without programming. For more information, see the online documentation. There are a longer intro and some screenshots in my 2021 introductory blog post about Otter Releasing 1.0.0 I m calling this release 1.0.0 because I think I can now say that its quality, reliability and stability is suitable for general use. In particular, Otter now builds on Stable Rust, which makes it a lot easier to install and maintain. Switching web framework, and async Rust I switched Otter from the Rocket web framework to Actix. There are things to prefer about both systems, and I still have a soft spot for Rocket. But ultimately I needed a framework which was fully released and supported for use with Stable Rust. There are few if any Rust web frameworks that are not async. This is rather a shame. Async Rust is a considerably more awkward programming environment than ordinary non-async Rust. I don t want to digress into a litany of complaints, but suffice it to say that while I really love Rust, my views on async Rust are considerably more mixed. Future plans In the near future I plan to add a couple of features to better support some particular games: currency-like resources, and a better UI for dice-like randomness. In the longer term, Otter s, installation and account management arrangements are rather unsophisticated and un-webby. There is not currently any publicly available instance for you to try it out without installing it on a machine of your own. There s not even any provided binaries: you must built Otter yourself. I hope to be able to improve this situation but it involves dealing with cloud CI and containers and so-on, which can all be rather unpleasant. Users on chiark will find an instance of Otter there.

comments

3 March 2022

Ian Jackson: 3D printed hard case for Fairphone 4

About 4 years ago, I posted about making a 3D printed case for my then-new phone. The FP2 was already a few years old when I got one and by now, some spares are unavailable - which is a problem, because I'm terribly hard on hardware. Indeed, that's why I need a very sturdy case for my phone - a case which can be ablative when necessary. With the arrival of my new Fairphone 4, I've updated my case design. Sadly the FP4 doesn't have a notification LED - I guess we're supposed to be glued to the screen and leaving the phone ignored in a corner unless it lights up is forbidden. But that does at least make the printing simpler, as there's no need for a window for the LED. Source code: https://www.chiark.greenend.org.uk/ucgi/~ianmdlvl/git?p=reprap-play.git;a=blob;f=fairphone4-case.scad;h=1738612c2aafcd4ee4ea6b8d1d14feffeba3b392;hb=629359238b2938366dc6e526d30a2a7ddec5a1b0 And the diagrams (which are part of the source, although I didn't update them for the FP4 changes: https://www.chiark.greenend.org.uk/ucgi/~ianmdlvl/git?p=reprap-diagrams.git;a=tree;f=fairphone-case;h=65f423399cbcfd3cf24265ed3216e6b4c0b26c20;hb=07e1723c88a294d68637bb2ca3eac388d2a0b5d4 ( big pictures )

comments

23 February 2022

Ian Jackson: Rooting an Eos Fairphone 4

Last week I received (finally) my Fairphone 4, supplied with a de-googled operating system, which I had ordered from the E Foundation s shop in December. (I m am very hard on hardware and my venerable Fairphone 2 is really on its last legs.) I expect to have full control over the software on any computing device I own which is as complicated, capable, and therefore, hazardous, as a mobile phone. Unfortunately the Eos image (they prefer to spell it /e/ os , srsly!) doesn t come with a way to get root without taking fairly serious measures including unlocking the bootloader. Unlocking the bootloader wouldn t be desirable for me but I can t live without root. So. I started with these helpful instructions: https://forum.xda-developers.com/t/fairphone-4-root.4376421/ I found the whole process a bit of a trial, and I thought I would write down what I did. But, it s not straightforward, at least for someone like me who only has a dim understanding of all this Android stuff. Unfortunately, due to the number of missteps and restarts, what I actually did is not really a sensible procedure. So here is a retcon of a process I think will work: Unlock the bootloader The E Foundation provide instructions for unlocking the bootloader on a stock FP4, here https://doc.e.foundation/devices/FP4/install and they seem applicable to the Murena phone supplied with Eos pre-installed, too. NB tht unlocking the bootloader wipes the phone. So we do it first. So:

Power on the phone, with no SIM installed
You get a welcome screen.
Skip all things on startup including wifi
Go to the very end of the settings, tap a gazillion times on the phone s version until you re a developer
In the developer settings, allow usb debugging
In the developer settings, allow oem bootloader unlocking
Connect a computer via a USB cable, say yes on phone to USB debugging
adb reboot bootloader
The phone will reboot into a texty kind of screen, the bootloader
fastboot flashing unlock
The phone will reboot, back to the welcome screen
Repeat steps 3-9 (maybe not all are necessary)
fastboot flashing unlock_critical
The phone will reboot, back to the welcome screen

Note that although you are running fastboot, you must run this command with the phone in bootloader mode, not fastboot (aka fastbootd ) mode. If you run fastboot flashing unlcok from fastboot you just get a don t know what you re talking about . I found conflicting instructions on what kind of Vulcan nerve pinches could be used to get into which boot modes, and had poor experiences with those. adb reboot bootloader always worked reliably for me. Some docs say to run fastboot oem unlock; I used flashing. Maybe this depends on the Android tools version. Initial privacy prep and OTA update We want to update the supplied phone OS. The build mine shipped with is too buggy to run Magisk, the application we are going to use to root the phone. (With the pre-installed phone OS, Magisk crashes at the patch boot image step.) But I didn t want to let the phone talk to Google, even for the push notifications registration.

From the welcome screen, skip all things except location, date, time. Notably, do not set up wifi
In settings, microg section
1. turn off cloud messaging
2. turn off google safetynet
3. turn off google registration (NB you must do this after the other two, because their sliders become dysfunctional after you turn google registration off)
4. turn off both location modules
In settings, location section, turn off allowed location for browser and magic earth
Now go into settings and enable wifi, giving it your wifi details
Tell the phone to update its operating system. This is a big download.

Install Magisk, the root manager (As a starting point I used these instructions https://www.xda-developers.com/how-to-install-magisk/ and a lot of random forum posts.) You will need the official boot.img. Bizarrely there doesn t seem to be a way to obtain this from the phone. Instead, you must download it. You can find it by starting at https://doc.e.foundation/devices/FP4/install which links to https://images.ecloud.global/stable/FP4/. At the time of writing, the most recent version, whose version number seemed to correspond to the OS update I installed above, was IMG-e-0.21-r-20220123158735-stable-FP4.zip.

Download the giant zipfile to your computer
Unzip it to extract boot.img
Copy the file to your phone s storage . Eg, via adb: with the phone booted into the main operating system, using USB debugging, adb push boot.img /storage/self/primary/Download.
On the phone, open the browser, and enter https://f-droid.org. Click on the link to install f-droid. You will need to enable installing apps from the browser (follow the provided flow to the settings, change the setting, and then use Back, and you can do the install). If you wish, you can download the f-droid apk separately on a computer, and verify it with pgp.
Using f-droid, install Magisk. You will need to enable installing apps from f-droid. (I installed Magisk from f-droid because 1. I was going to trust f-droid anyway 2. it has a shorter URL than Magisk s.)
Open the Magisk app. Tell Magisk to install (Magisk, not the app). There will be only one option: patch boot file. Tell it to patch the boot.img file from before.
Transfer the magisk_patched-THING.img back to your computer (eg via adb pull).
adb reboot bootloader
fastboot boot magisk_patched-THING.img (again, NB, from bootloader mode, not from fastboot mode)
In Magisk you ll see it shows as installed. But it s not really; you ve just booted from an image with it. Ask to install Magisk with Direct install .

After you have done all this, I believe that each time you do an over-the-air OS update, you must, between installing the update and rebooting the phone, ask Magisk to Install to inactive slot (after OTA) . Presumably if you forget you must do the fastboot boot dance again. After all this, I was able to use tsu in Termux. There s a strange behaviour with the root prompt you get apropos Termux s request for root; I found that it definitely worked if Termux wasn t the foreground app You have to leave the bootloader unlocked. Howwever, as I understand it, the phone s encryption will still prevent an attacker from hoovering the data out of your phone. The bootloader lock is to prevent someone tricking you into entering the decryption passkey into a trojaned device. Other things to change

Probably, after you re done with this, disable installing apps from the Browser. I will install Signal before doing that, since that s not in f-droid because of mutual distrust between the f-droid and Signal folks. The permission is called Install unknown apps .
Turn off instant apps aka open links in apps even if the app is not installed . OMG WTF BBQ.
Turn off wifi scanning even if wifi off . WTF.
I turned off storage manager auto delete, on the grounds that I didn t know what the phone might think of as having been backed up . I can manage my own space use, thanks very much.

There are probably other things to change. I have not yet transferred my Signal account from my old phone. It is possible that Signal will require me to re-enable the google push notifications, but I hope that having disabled them in microg it will be happy to use its own system, as it does on my old phone.

comments

4 February 2022

Ian Jackson: EUDCC QR codes vs NHS Travel barcodes vs TAC Verify

The EU Digital Covid Certificate scheme is a format for (digitally signed) vaccination status certificates. Not only EU countries participate - the UK is now a participant in this scheme. I am currently on my way to go skiing in the French Alps. So I needed a certificate that would be accepted in France. AFAICT the official way to do this is to get the international certificate from the NHS, and take it to a French pharmacy who will convert it into something suitably French. (AIUI the NHS international barcode is the same regardless of whether you get it via the NHS website, the NHS app, or a paper letter. NB that there is one barcode per vaccine dose so you have to get the right one - probably that means your booster since there s a 9 month rule!) I read on an forum somewhere that you could use the French TousAntiCovid app to convert the barcode. So I thought I would try that. The TousAntiCovid is Free Softare and on F-Droid, so I was happy to install and use it for this. I also used the French TAC Verify app to check to see what barcodes were accepted. (I found an official document addressed to French professionals recommending this as an option for verifying the status of visitors to one s establishment.) Unfortunately this involves a googlified phone, but one could use a burner phone or ask a friend who s bitten that bullet already. I discovered that, indeed:

My own NHS QR code is not accepted by TAC Verify
My own NHS QR code can be loaded into TousAntiCovid, and added to my wallet in the app
If I get TousAntiCovid to display that certificate, it shows a visually different QR code which TAC Verify accepts

This made me curious. I used a QR code reader to decode both barcodes. The decodings were identical! A long string of guff starting HC1:. AIUI it is an encoded JWT. But there was a difference in the framing: Binary Eye reported that the NHS barcode used error correction level M (medium, aka 15%). The TousAntiCovid barcode used level L (low, 7%). I had my QR code software regenerate a QR code at level M for the data from the TousAntiCovid code. The result was a QR code which is identical (pixel-wise) to the one from the NHS. So the only difference is the error correction level. Curiously, both L (low, generated by TousAntiCovid, accepted by TAC Verify) and M (medium, generated by NHS, rejected by TAC Verify) are lower than the Q (25") recommended by what I think is the specification. This is all very odd. But the upshot is that I think you can convert the NHS international barcode into something that should work in France simply by passing it through any QR code software to re-encode it at error correction level L (7%). But if you re happy to use the TousAntiCovid app it s probably a good way to store them. I guess I ll find out when I get to France if the converted NHS barcodes work in real establishments. Thanks to the folks behind sanipasse.fr for publishing some helpful backround info and operating a Free Software backed public verification service. Footnote To compare the QR codes pixelwise, I roughly cropped the NHS PDF image using a GUI tool, and then on each of the two images used pnmcrop (to trim the border), pnmscale (to rescale the one-pixel-per-pixel output from Binary Eye) and pnmarith -difference to compare them (producing a pretty squirgly image showing just the pixel edges due to antialiasing).

comments

3 January 2022

Ian Jackson: Debian s approach to Rust - Dependency handling

tl;dr: Faithfully following upstream semver, in Debian package dependencies, is a bad idea. Introduction I have been involved in Debian for a very long time. And I ve been working with Rust for a few years now. Late last year I had cause to try to work on Rust things within Debian. When I did, I found it very difficult. The Debian Rust Team were very helpful. However, the workflow and tooling require very large amounts of manual clerical work - work which it is almost impossible to do correctly since the information required does not exist. I had wanted to package a fairly straightforward program I had written in Rust, partly as a learning exercise. But, unfortunately, after I got stuck in, it looked to me like the effort would be wildly greater than I was prepared for, so I gave up. Since then I ve been thinking about what I learned about how Rust is packaged in Debian. I think I can see how to fix some of the problems. Although I don t want to go charging in and try to tell everyone how to do things, I felt I ought at least to write up my ideas. Hence this blog post, which may become the first of a series. This post is going to be about semver handling. I see problems with other aspects of dependency handling and source code management and traceability as well, and of course if my ideas find favour in principle, there are a lot of details that need to be worked out, including some kind of transition plan. How Debian packages Rust, and build vs runtime dependencies Today I will be discussing almost entirely build-dependencies; Rust doesn t (yet?) support dynamic linking, so built Rust binaries don t have Rusty dependencies. However, things are a bit confusing because even the Debian binary packages for Rust libraries contain pure source code. So for a Rust library package, building the Debian binary package from the Debian source package does not involve running the Rust compiler; it s just file-copying and format conversion. The library s Rust dependencies do not need to be installed on the build machine for this. So I m mostly going to be talking about Depends fields, which are Debian s way of talking about runtime dependencies, even though they are used only at build-time. The way this works is that some ultimate leaf package (which is supposed to produce actual executable code) Build-Depends on the libraries it needs, and those Depends on their under-libraries, so that everything needed is installed. What do dependencies mean and what are they for anyway? In systems where packages declare dependencies on other packages, it generally becomes necessary to support versioned dependencies. In all but the most simple systems, this involves an ordering (or similar) on version numbers and a way for a package A to specify that it depends on certain versions of B. Both Debian and Rust have this. Rust upstream crates have version numbers and can specify their dependencies according to semver. Debian s dependency system can represent that. So it was natural for the designers of the scheme for packaging Rust code in Debian to simply translate the Rust version dependencies to Debian ones. However, while the two dependency schemes seem equivalent in the abstract, their concrete real-world semantics are totally different. These different package management systems have different practices and different meanings for dependencies. (Interestingly, the Python world also has debates about the meaning and proper use of dependency versions.) The epistemological problem Consider some package A which is known to depend on B. In general, it is not trivial to know which versions of B will be satisfactory. I.e., whether a new B, with potentially-breaking changes, will actually break A. Sometimes tooling can be used which calculates this (eg, the Debian shlibdeps system for runtime dependencies) but this is unusual - especially for build-time dependencies. Which versions of B are OK can normally only be discovered by a human consideration of changelogs etc., or by having a computer try particular combinations. Few ecosystems with dependencies, in the Free Software community at least, make an attempt to precisely calculate the versions of B that are actually required to build some A. So it turns out that there are three cases for a particular combination of A and B: it is believed to work; it is known not to work; and: it is not known whether it will work. And, I am not aware of any dependency system that has an explicit machine-readable representation for the unknown state, so that they can say something like A is known to depend on B; versions of B before v1 are known to break; version v2 is known to work . (Sometimes statements like that can be found in human-readable docs.) That leaves two possibilities for the semantics of a dependency A depends B, version(s) V..W: Precise: A will definitely work if B matches V..W, and Optimistic: We have no reason to think B breaks with any of V..W. At first sight the latter does not seem useful, since how would the package manager find a working combination? Taking Debian as an example, which uses optimistic version dependencies, the answer is as follows: The primary information about what package versions to use is not only the dependencies, but mostly in which Debian release is being targeted. (Other systems using optimistic version dependencies could use the date of the build, i.e. use only packages that are current .)

	Precise	Optimistic
People involved in version management	Package developers, downstream developers/users.	Package developers, downstream developer/users, distribution QA and release managers.
Package developers declare versions V and dependency ranges V..W so that	It definitely works.	A wide range of B can satisfy the declared requirement.
The principal version data used by the package manager	Only dependency versions.	Contextual, eg, Releases - set(s) of packages available.
Version dependencies are for	Selecting working combinations (out of all that ever existed).	Sequencing (ordering) of updates; QA.
Expected use pattern by a downstream	Downstream can combine any declared-good combination.	Use a particular release of the whole system. Mixing-and-matching requires additional QA and remedial work.
Downstreams are protected from breakage by	Pessimistically updating versions and dependencies whenever anything might go wrong.	Whole-release QA.
A substantial deployment will typically contain	Multiple versions of many packages.	A single version of each package, except where there are actual incompatibilities which are too hard to fix.
Package updates are driven by	Top-down: Depending package updates the declared metadata.	Bottom-up: Depended-on package is updated in the repository for the work-in-progress release.

So, while Rust and Debian have systems that look superficially similar, they contain fundamentally different kinds of information. Simply representing the Rust versions directly into Debian doesn t work. What is currently done by the Debian Rust Team is to manually patch the dependency specifications, to relax them. This is very labour-intensive, and there is little automation supporting either decisionmaking or actually applying the resulting changes. What to do Desired end goal To update a Rust package in Debian, that many things depend on, one need simply update that package. Debian s sophisticated build and CI infrastructure will try building all the reverse-dependencies against the new version. Packages that actually fail against the new dependency are flagged as suffering from release-critical problems. Debian Rust developers then update those other packages too. If the problems turn out to be too difficult, it is possible to roll back. If a problem with a depending packages is not resolved in a timely fashion, priority is given to updating core packages, and the depending package falls by the wayside (since it is empirically unmaintainable, given available effort). There is no routine manual patching of dependency metadata (or of anything else). Radical proposal Debian should not precisely follow upstream Rust semver dependency information. Instead, Debian should optimistically try the combinations of packages that we want to have. The resulting breakages will be discovered by automated QA; they will have to be fixed by manual intervention of some kind, but usually, simply updating the depending package will be sufficient. This no longer ensures (unlike the upstream Rust scheme) that the result is expected to build and work if the dependencies are satisfied. But as discussed, we don t really need that property in Debian. More important is the new property we gain: that we are able to mix and match versions that we find work in practice, without a great deal of manual effort. Or to put it another way, in Debian we should do as a Rust upstream maintainer does when they do the regular update dependencies for new semvers task: we should update everything, see what breaks, and fix those. (In theory a Rust upstream package maintainer is supposed to do some additional checks or something. But the practices are not standardised and any checks one does almost never reveal anything untoward, so in practice I think many Rust upstreams just update and see what happens. The Rust upstream community has other mechanisms - often, reactive ones - to deal with any problems. Debian should subscribe to those same information sources, eg RustSec.) Nobbling cargo Somehow, when cargo is run to build Rust things against these Debian packages, cargo s dependency system will have to be overridden so that the version of the package that is actually selected by Debian s package manager is used by cargo without complaint. We probably don t want to change the Rust version numbers of Debian Rust library packages, so this should be done by either presenting cargo with an automatically-massaged Cargo.toml where the dependency version restrictions are relaxed, or by using a modified version of cargo which has special option(s) to relax certain dependencies. Handling breakage Rust packages in Debian should already be provided with autopkgtests so that ci.debian.net will detect build breakages. Build breakages will stop the updated dependency from migrating to the work-in-progress release, Debian testing. To resolve this, and allow forward progress, we will usually upload a new version of the dependency containing an appropriate Breaks, and either file an RC bug against the depending package, or update it. This can be done after the upload of the base package. Thus, resolution of breakage due to incompatibilities will be done collaboratively within the Debian archive, rather than ad-hoc locally. And it can be done without blocking. My proposal prioritises the ability to make progress in the core, over stability and in particular over retaining leaf packages. This is not Debian s usual approach but given the Rust ecosystem s practical attitudes to API design, versioning, etc., I think the instability will be manageable. In practice fixing leaf packages is not usually really that hard, but it s still work and the question is what happens if the work doesn t get done. After all we are always a shortage of effort - and we probably still will be, even if we get rid of the makework clerical work of patching dependency versions everywhere (so that usually no work is needed on depending packages). Exceptions to the one-version rule There will have to be some packages that we need to keep multiple versions of. We won t want to update every depending package manually when this happens. Instead, we ll probably want to set a version number split: rdepends which want version <X will get the old one. Details - a sketch I m going to sketch out some of the details of a scheme I think would work. But I haven t thought this through fully. This is still mostly at the handwaving stage. If my ideas find favour, we ll have to do some detailed review and consider a whole bunch of edge cases I m glossing over. The dependency specification consists of two halves: the depending .deb s Depends (or, for a leaf package, Build-Depends) and the base .deb Version and perhaps Breaks and Provides. Even though libraries vastly outnumber leaf packages, we still want to avoid updating leaf Debian source packages simply to bump dependencies. Dependency encoding proposal Compared to the existing scheme, I suggest we implement the dependency relaxation by changing the depended-on package, rather than the depending one. So we retain roughly the existing semver translation for Depends fields. But we drop all local patching of dependency versions. Into every library source package we insert a new Debian-specific metadata file declaring the earliest version that we uploaded. When we translate a library source package to a .deb, the binary package build adds Provides for every previous version. The effect is that when one updates a base package, the usual behaviour is to simply try to use it to satisfy everything that depends on that base package. The Debian CI will report the build or test failures of all the depending packages which the API changes broke. We will have a choice, then: Breakage handling - update broken depending packages individually If there are only a few packages that are broken, for each broken dependency, we add an appropriate Breaks to the base binary package. (The version field in the Breaks should be chosen narrowly, so that it is possible to resolve it without changing the major version of the dependency, eg by making a minor source change.) When can then do one of the following:

Update the dependency from upstream, to a version which works with the new base. (Assuming there is one.) This should be the usual response.
Fix the dependency source code so that builds and works with the new base package. If this wasn t just a backport of an upstream change, we should send our fix upstream. (We should prefer to update the whole package, than to backport an API adjustment.)
File an RC bug against the dependency (which will eventually trigger autoremoval), or preemptively ask for the Debian release managers to remove the dependency from the work-in-progress release.

Breakage handling - declare new incompatible API in Debian If the API changes are widespread and many dependencies are affected, we should represent this by changing the in-Debian-source-package metadata to arrange for fewer Provides lines to be generated - withdrawing the Provides lines for earlier APIs. Hopefully examination of the upstream changelog will show what the main compat break is, and therefore tell us which Provides we still want to retain. This is like declaring Breaks for all the rdepends. We should do it if many rdepends are affected. Then, for each rdependency, we must choose one of the responses in the bullet points above. In practice this will often be a mass bug filing campaign, or large update campaign. Breakage handling - multiple versions Sometimes there will be a big API rewrite in some package, and we can t easily update all of the rdependencies because the upstream ecosystem is fragmented and the work involved in reconciling it all is too substantial. When this happens we will bite the bullet and include multiple versions of the base package in Debian. The old version will become a new source package with a version number in its name. This is analogous to how key C/C++ libraries are handled. Downsides of this scheme The first obvious downside is that assembling some arbitrary set of Debian Rust library packages, that satisfy the dependencies declared by Debian, is no longer necessarily going to work. The combinations that Debian has tested - Debian releases - will work, though. And at least, any breakage will affect only people building Rust code using Debian-supplied libraries. Another less obvious problem is that because there is no such thing as Build-Breaks (in a Debian binary package), the per-package update scheme may result in no way to declare that a particular library update breaks the build of a particular leaf package. In other words, old source packages might no longer build when exposed to newer versions of their build-dependencies, taken from a newer Debian release. This is a thing that already happens in Debian, with source packages in other languages, though. Semver violation I am proposing that Debian should routinely compile Rust packages against dependencies in violation of the declared semver, and ship the results to Debian s millions of users. This sounds quite alarming! But I think it will not in fact lead to shipping bad binaries, for the following reasons: The Rust community strongly values safety (in a broad sense) in its APIs. An API which is merely capable of insecure (or other seriously bad) use is generally considered to be wrong. For example, such situations are regarded as vulnerabilities by the RustSec project, even if there is no suggestion that any actually-broken caller source code exists, let alone that actually-broken compiled code is likely. The Rust community also values alerting programmers to problems. Nontrivial semantic changes to APIs are typically accompanied not merely by a semver bump, but also by changes to names or types, precisely to ensure that broken combinations of code do not compile. Or to look at it another way, in Debian we would simply be doing what many Rust upstream developers routinely do: bump the versions of their dependencies, and throw it at the wall and hope it sticks. We can mitigate the risks the same way a Rust upstream maintainer would: when updating a package we should of course review the upstream changelog for any gotchas. We should look at RustSec and other upstream ecosystem tracking and authorship information. Difficulties for another day As I said, I see some other issues with Rust in Debian.

I think the library feature flag encoding scheme is unnecessary. I hope to explain this in a future essay.
I found Debian s approach to handling the source code for its Rust packages quite awkward; and, it has some troubling properties. Again, I hope to write about this later.
I get the impression that updating rustc in Debian is a very difficult process. I haven t worked on this myself and I don t feel qualified to have opinions about it. I hope others are thinking about how to make things easier.

Thanks all for your attention!

comments

20 October 2021

Ian Jackson: Going to work for the Tor Project

I have accepted a job with the Tor Project. I joined XenSource to work on Xen in late 2007, as XenSource was being acquired by Citrix. So I have been at Citrix for about 14 years. I have really enjoyed working on Xen. There have been a variety of great people. I'm very proud of some of the things we built and achieved. I'm particularly proud of being part of a community that has provided the space for some of my excellent colleagues to really grow. But the opportunity to go and work on a project I am so ideologically aligned with, and to work with Rust full-time, is too good to resist. That Tor is mostly written in C is quite terrifying, and I'm very happy that I'm going to help fix that!

comments

29 September 2021

Ian Jackson: Rust for the Polyglot Programmer

Rust is definitely in the news. I'm definitely on the bandwagon. (To me it feels like I've been wanting something like Rust for many years.) There're a huge number of intro tutorials, and of course there's the Rust Book. A friend observed to me, though, that while there's a lot of "write your first simple Rust program" there's a dearth of material aimed at the programmer who already knows a dozen diverse languages, and is familiar with computer architecture, basic type theory, and so on. Or indeed, for the impatient and confident reader more generally. I thought I would have a go. Rust for the Polyglot Programmer is the result. Compared to much other information about Rust, Rust for the Polyglot Programmer is:

Dense: I assume a lot of starting knowledge. Or to look at it another way: I expect my reader to be able to look up and digest non-Rust-specific words or concepts.
Broad: I cover not just the language and tools, but also the library ecosystem, development approach, community ideology, and so on.
Frank: much material about Rust has a tendency to gloss over or minimise the bad parts. I don't do that. That also frees me to talk about strategies for dealing with the bad parts.
Non-neutral: I'm not afraid to recommend particular libraries, for example. I'm not afraid to extol Rust's virtues in the areas where it does well.
Terse, and sometimes shallow: I often gloss over what I see as unimportant or fiddly details; instead I provide links to appropriate reference materials.

After reading Rust for the Polyglot Programmer, you won't know everything you need to know to use Rust for any project, but should know where to find it. Thanks are due to Simon Tatham, Mark Wooding, Daniel Silverstone, and others, for encouragement, and helpful reviews including important corrections. Particular thanks to Mark Wooding for wrestling pandoc and LaTeX into producing a pretty good-looking PDF. Remaining errors are, of course, mine. Comments are welcome of course, via the Dreamwidth comments or Salsa issue or MR. (If you're making a contribution, please indicate your agreement with the Developer Certificate of Origin.)

edited 2021-09-29 16:58 UTC to fix Salsa link targe, and 17:01 and 17:21 to for minor grammar fixes

comments

22 September 2021

Ian Jackson: Tricky compatibility issue - Rust's io::ErrorKind

This post is about some changes recently made to Rust's ErrorKind, which aims to categorise OS errors in a portable way. Audiences for this post

The educated general reader interested in a case study involving error handling, stability, API design, and/or Rust.
Rust users who have tripped over these changes. If this is you, you can cut to the chase and skip to How to fix.

Background and context Error handling principles Handling different errors differently is often important (although, sadly, often neglected). For example, if a program tries to read its default configuration file, and gets a "file not found" error, it can proceed with its default configuration, knowing that the user hasn't provided a specific config. If it gets some other error, it should probably complain and quit, printing the message from the error (and the filename). Otherwise, if the network fileserver is down (say), the program might erroneously run with the default configuration and do something entirely wrong. Rust's portability aims The Rust programming language tries to make it straightforward to write portable code. Portable error handling is always a bit tricky. One of Rust's facilities in this area is std::io::ErrorKind which is an enum which tries to categorise (and, sometimes, enumerate) OS errors. The idea is that a program can check the error kind, and handle the error accordingly. That these ErrorKinds are part of the Rust standard library means that to get this right, you don't need to delve down and get the actual underlying operating system error number, and write separate code for each platform you want to support. You can check whether the error is ErrorKind::NotFound (or whatever). Because ErrorKind is so important in many Rust APIs, some code which isn't really doing an OS call can still have to provide an ErrorKind. For this purpose, Rust provides a special category ErrorKind::Other, which doesn't correspond to any particular OS error. Rust's stability aims and approach Another thing Rust tries to do is keep existing code working. More specifically, Rust tries to:

Avoid making changes which would contradict the previously-published documentation of Rust's language and features.
Tell you if you accidentally rely on properties which are not part of the published documentation.

By and large, this has been very successful. It means that if you write code now, and it compiles and runs cleanly, it is quite likely that it will continue work properly in the future, even as the language and ecosystem evolves. This blog post is about a case where Rust failed to do (2), above, and, sadly, it turned out that several people had accidentally relied on something the Rust project definitely intended to change. Furthermore, it was something which needed to change. And the new (corrected) way of using the API is not so obvious. Rust enums, as relevant to io::ErrorKind (Very briefly:) When you have a value which is an io::ErrorKind, you can compare it with specific values:

    if error.kind() == ErrorKind::NotFound   ...

But in Rust it's more usual to write something like this (which you can read like a switch statement):

    match error.kind()  
      ErrorKind::NotFound => use_default_configuration(),
      _ => panic!("could not read config file  :  ", &file, &error),

Here _ means "anything else". Rust insists that match statements are exhaustive, meaning that each one covers all the possibilities. So if you left out the line with the _, it wouldn't compile. Rust enums can also be marked non_exhaustive, which is a declaration by the API designer that they plan to add more kinds. This has been done for ErrorKind, so the _ is mandatory, even if you write out all the possibilities that exist right now: this ensures that if new ErrorKinds appear, they won't stop your code compiling. Improving the error categorisation The set of error categories stabilised in Rust 1.0 was too small. It missed many important kinds of error. This makes writing error-handling code awkward. In any case, we expect to add new error categories occasionally. I set about trying to improve this by proposing new ErrorKinds. This obviously needed considerable community review, which is why it took about 9 months. The trouble with Other and tests Rust has to assign an ErrorKind to every OS error, even ones it doesn't really know about. Until recently, it mapped all errors it didn't understand to ErrorKind::Other - reusing the category for "not an OS error at all". Serious people who write serious code like to have serious tests. In particular, testing error conditions is really important. For example, you might want to test your program's handling of disk full, to make sure it didn't crash, or corrupt files. You would set up some contraption that would simulate a full disk. And then, in your tests, you might check that the error was correct. But until very recently (still now, in Stable Rust), there was no ErrorKind::StorageFull. You would get ErrorKind::Other. If you were diligent you would dig out the OS error code (and check for ENOSPC on Unix, corresponding Windows errors, etc.). But that's tiresome. The more obvious thing to do is to check that the kind is Other. Obvious but wrong. ErrorKind is non_exhaustive, implying that more error kinds will appears, and, naturally, these would more finely categorise previously-Other OS errors. Unfortunately, the documentation note

Errors that are Other now may move to a different or a new ErrorKind variant in the future.

was only added in May 2020. So the wrongness of the "obvious" approach was, itself, not very obvious. And even with that docs note, there was no compiler warning or anything. The unfortunate result is that there is a body of code out there in the world which might break any time an error that was previously Other becomes properly categorised. Furthermore, there was nothing stopping new people writing new obvious-but-wrong code. Chosen solution: Uncategorized The Rust developers wanted an engineered safeguard against the bug of assuming that a particular error shows up as Other. They chose the following solution: There is now a new ErrorKind::Uncategorized which is now used for all OS errors for which there isn't a more specific categorisation. The fallback translation of unknown errors was changed from Other to Uncategorised. This is de jure justified by the fact that this enum has always been marked non_exhaustive. But in practice because this bug wasn't previously detected, there is such code in the wild. That code now breaks (usually, in the form of failing test cases). Usually when Rust starts to detect a particular programming error, it is reported as a new warning, which doesn't break anything. But that's not possible here, because this is a behavioural change. The new ErrorKind::Uncategorized is marked unstable. This makes it impossible to write code on Stable Rust which insists that an error comes out as Uncategorized. So, one cannot now write code that will break when new ErrorKinds are added. That's the intended effect. The downside is that this does break old code, and, worse, it is not as clear as it should be what the fixed code looks like. Alternatives considered and rejected by the Rust developers Not adding more ErrorKinds This was not tenable. The existing set is already too small, and error categorisation is in any case expected to improve over time. Just adding ErrorKinds as had been done before This would mean occasionally breaking test cases (or, possibly, production code) when an error that was previously Other becomes categorised. The broken code would have been "obvious", but de jure wrong, just as it is now, So this option amounts to expecting this broken code to continue to be written and continuing to break it occasionally. Somehow using Rust's Edition system The Rust language has a system to allow language evolution, where code declares its Edition (2015, 2018, 2021). Code from multiple editions can be combined, so that the ecosystem can upgrade gradually. It's not clear how this could be used for ErrorKind, though. Errors have to be passed between code with different editions. If those different editions had different categorisations, the resulting programs would have incoherent and broken error handling. Also some of the schemes for making this change would mean that new ErrorKinds could only be stabilised about once every 3 years, which is far too slow. How to fix code broken by this change Most main-line error handling code already has a fallback case for unknown errors. Simply replacing any occurrence of Other with _ is right. How to fix thorough tests The tricky problem is tests. Typically, a thorough test case wants to check that the error is "precisely as expected" (as far as the test can tell). Now that unknown errors come out as an unstable Uncategorized variant that's not so easy. If the test is expecting an error that is currently not categorised, you want to write code that says "if the error is any of the recognised kinds, call it a test failure". What does "any of the recognised kinds" mean here ? It doesn't meany any of the kinds recognised by the version of the Rust stdlib that is actually in use. That set might get bigger. When the test is compiled and run later, perhaps years later, the error in this test case might indeed be categorised. What you actually mean is "the error must not be any of the kinds which existed when the test was written". IMO therefore the right solution for such a test case is to cut and paste the current list of stable ErrorKinds into your code. This will seem wrong at first glance, because the list in your code and in Rust can get out of step. But when they do get out of step you want your version, not the stdlib's. So freezing the list at a point in time is precisely right. You probably only want to maintain one copy of this list, so put it somewhere central in your codebase's test support machinery. Periodically, you can update the list deliberately - and fix any resulting test failures. Unfortunately this approach is not suggested by the documentation. In theory you could work all this out yourself from first principles, given even the situation prior to May 2020, but it seems unlikely that many people have done so. In particular, cutting and pasting the list of recognised errors would seem very unnatural. Conclusions This was not an easy problem to solve well. I think Rust has done a plausible job given the various constraints, and the result is technically good. It is a shame that this change to make the error handling stability more correct caused the most trouble for the most careful people who write the most thorough tests. I also think the docs could be improved.

edited shortly after posting, and again 2021-09-22 16:11 UTC, to fix HTML slips

comments

15 September 2021

Ian Jackson: Get source to Debian packages only via dgit; "official" git links are beartraps

tl;dr dgit clone sourcepackage gets you the source code, as a git tree, in ./sourcepackage. cd into it and dpkg-buildpackage -uc -b. Do not use: "VCS" links on official Debian web pages like tracker.debian.org; "debcheckout"; searching Debian's gitlab (salsa.debian.org). These are good for Debian experts only. If you use Debian's "official" source git repo links you can easily build a package without Debian's patches applied.[1] This can even mean missing security patches. Or maybe it can't even be built in a normal way (or at all). OMG WTF BBQ, why? It's complicated. There is History. Debian's "most-official" centralised source repository is still the Debian Archive, which is a system based on tarballs and patches. I invented the Debian source package format in 1992/3 and it has been souped up since, but it's still tarballs and patches. This system is, of course, obsolete, now that we have modern version control systems, especially git. Maintainers of Debian packages have invented ways of using git anyway, of course. But this is not standardised. There is a bewildering array of approaches. The most common approach is to maintain git tree containing a pile of *.patch files, which are then often maintained using quilt. Yes, really, many Debian people are still using quilt, despite having git! There is machinery for converting this git tree containing a series of patches, to an "official" source package. If you don't use that machinery, and just build from git, nothing applies the patches. [1] This post was prompted by a conversation with a friend who had wanted to build a Debian package, and didn't know to use dgit. They had got the source from salsa via a link on tracker.d.o, and built .debs without Debian's patches. This not a theoretical unsoundness, but a very real practical risk. Future is not very bright In 2013 at the Debconf in Vaumarcus, Joey Hess, myself, and others, came up with a plan to try to improve this which we thought would be deployable. (Previous attempts had failed.) Crucially, this transition plan does not force change onto any of Debian's many packaging teams, nor onto people doing cross-package maintenance work. I worked on this for quite a while, and at a technical level it is a resounding success. Unfortunately there is a big limitation. At the current stage of the transition, to work at its best, this replacement scheme hopes that maintainers who update a package will use a new upload tool. The new tool fits into their existing Debian git packaging workflow and has some benefits, but it does make things more complicated rather than less (like any transition plan must, during the transitional phase). When maintainers don't use this new tool, the standardised git branch seen by users is a compatibility stub generated from the tarballs-and-patches. So it has the right contents, but useless history. The next step is to allow a maintainer to update a package without dealing with tarballs-and-patches at all. This would be massively more convenient for the maintainer, so an easy sell. And of course it links the tarballs-and-patches to the git history in a proper machine-readable way. We held a "git packaging requirements-gathering session" at the Curitiba Debconf in 2019. I think the DPL's intent was to try to get input into the git workflow design problem. The session was a great success: my existing design was able to meet nearly everyone's needs and wants. The room was obviously keen to see progress. The next stage was to deploy tag2upload. I spoke to various key people at the Debconf and afterwards in 2019 and the code has been basically ready since then. Unfortunately, deployment of tag2upload is mired in politics. It was blocked by a key team because of unfounded security concerns; positive opinions from independent security experts within Debian were disregarded. Of course it is always hard to get a team to agree to something when it's part of a transition plan which treats their systems as an obsolete setup retained for compatibility. Current status If you don't know about Debian's git packaging practices (eg, you have no idea what "patches-unapplied packaging branch without .pc directory" means), and don't want want to learn about them, you must use dgit to obtain the source of Debian packages. There is a lot more information and detailed instructions in dgit-user(7). Hopefully either the maintainer did the best thing, or, if they didn't, you won't need to inspect the history. If you are a Debian maintainer, you should use dgit push-source to do your uploads. This will make sure that users of dgit will see a reasonable git history.

edited 2021-09-15 14:48 Z to fix a typo

comments

8 September 2021

Ian Jackson: Wanted: Rust sync web framework

tl;dr: Please recommend me a high-level Rust server-side web framework which is sync and does not plan to move to an async api. Why Async Rust gives somewhat higher performance. But it is considerably less convenient and ergonomic than using threads for concurrency. Much work is ongoing to improve matters, but I doubt async Rust will ever be as easy to write as sync Rust. "Even" sync multithreaded Rust is very fast (and light on memory use) compared to many things people write web apps in. The vast majority of web applications do not need the additional performance (which is typically a percentage, not a factor). So it is rather disappointing to find that all the review articles I read, and all the web framework authors, seem to have assumed that async is the inevitable future. There should be room for both sync and async. Please let universal use of async not be the inevitable future! What I would like a web framework that provides a sync API (something like Rocket 0.4's API would be ideal) and will remain sync. It should probably use (async) hyper underneath. So far I have not found one single web framework on crates.io that neither is already async nor suggests that its authors intend to move to an async API. Some review articles I found even excluded sync frameworks entirely! Answers in the comments please :-).

comments

Next.

Previous.